Getting ready

Set system locale

Sys.setlocale(category = "LC_ALL", locale = "Greek")

Load packages

library(here) # [CRAN] A replacement for 'file.path()', locating the files relative to the project root
library(tidyverse) # [CRAN] Easily install and load the 'Tidyverse'  
library(cowplot)  # [CRAN] Streamlined plot theme and plot annotations for 'ggplot2' 
library(RColorBrewer) # [CRAN] ColorBrewer Palettes
library(PerformanceAnalytics) # [CRAN] Collection of econometric functions for performance and risk analysis
library(MicrobeR) # [github::jbisanz/MicrobeR] Handy functions for downstream visualization fo microbiome data
library(phyloseq) # [Bioconductor] Handling and analysis of high-throughput microbiome census data 
library(factoextra) # [CRAN] Extract and visualize the results of multivariate data analyses 
library(lmerTest) # [CRAN] Fits and tests in linear mixed effects models 
library(lsr) # [CRAN] # Companion to "Learning Statistics with R" 
library(EMAtools) # [CRAN] # Data Management Tools for Real-Time Monitoring/Ecological Momentary Assessment Data 
library(DT) # [CRAN] An R interface to the DataTables library
library(Maaslin2)# [bitbucket::biobakery/maaslin2] Multivariate association analysis for microbiome data

Load functions

source(here("code", "functions", "maaslin2_heatmap.R"))

Import data

load(here("data", "qiime2R", "phyloseq.RData"))

# Extract feature table, taxonomy and metadata from the phyloseq object
count_tab <- as.data.frame(otu_table(phyloseq)) 
tax_tab <- tax_table(phyloseq) %>% as("matrix") %>% as.data.frame()
metadata <- data.frame(sample_data(phyloseq), check.names = FALSE) 

Tidy sample metadata

metadata <- metadata %>% 
  rownames_to_column() %>% 
  # hyphens in column names cause problems. We use underscore instead
  rename_all(~gsub("-", "_", .x)) %>%
  # convert FishID and NetPen to character so that they can be correctly recognized as random effects
  mutate(FishID = as.character(FishID), NetPen = as.character(NetPen))

Collapse feature table at genus level

First of all, we get the best taxonomic annotation for each feature. Starting at the phylum level, a higher level taxonomic rank will be assigned to a lower level taxonomic rank if the lower taxonomic rank is null or contains strings like “uncultured”, “Ambiguous”, or “metagenome”.

tab_species <- Summarize.Taxa(count_tab, tax_tab)$Species %>%
  rownames_to_column("tax") %>%
  separate(tax, sep = ";", c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")) %>%
  # the following 5 lines format the taxonomy of each feature to the best hit
  mutate(Class = ifelse(Class == "NA"|grepl("uncultured|Ambiguous|metagenome", Class), Phylum, Class),
         Order = ifelse(Order == "NA"|grepl("uncultured|Ambiguous|metagenome", Order), Class, Order),
         Family = ifelse(Family == "NA"|grepl("uncultured|Ambiguous|metagenome", Family), Order, Family),
         Genus = ifelse(Genus == "NA"|grepl("uncultured|Ambiguous|metagenome", Genus), Family, Genus),
         Species = ifelse(Species == "NA"|grepl("uncultured|Ambiguous|metagenome", Species), Genus, Species)) %>%
  select(-(Kingdom:Family)) 

Then we filter genera consisting of only one species in the present data set.

index <- tab_species %>%
  filter(grepl("g__", Genus)) %>%
  group_by(Genus) %>%
  nest() %>%
  mutate(nrow = map_int(data, ~nrow(.x))) %>%
  filter(nrow == 1) %>%
  unnest() %>%
  filter(grepl("s__", Species)) 

For genera consisting of only one species, we replace genus names with species names.

tab_genus <- tab_species %>%
  mutate(Genus = ifelse(grepl(paste(index$Genus, collapse = "|"), Genus), Species, Genus)) %>%
  select(-Species) %>%
  # remove the prefix from genus/species names
  mutate_if(is.character, ~gsub("g__|s__", "", .x)) %>%
  # the following 2 lines merge rows with the same taxonomy
  group_by(Genus) %>%
  summarise_all(sum) %>%
  column_to_rownames("Genus")

Multivariate association analysis

For the multivariate association analysis, we’ll treat FishID and NetPen as random effects. Diet and Sample origin are the two main fixed effects we’re interested in. We’re also interested in potential associations between the microbial clades and other sample metadata collected from the same fish. Due to the small number of observations, we’ll limit the association tests to some of the sample metadata, including the distal intestine weight (DISI), histological scores of the distal intestine and expression levels of genes indicative of immune responses and barrier functions.

Exploration of histological scores

Histological examination of distal intestine showed various degrees of inflammation in different fish with no obivious diet effects. The morphological changes resemble those commonly observed in salmonid fed soybean meal as illustated below: shortening and fusion of mucosal folds, cellular infiltration within the lamina propria and submucosa, reduced supravacuolization within enterocytes and nucleus position disparity. Specifically, the histological sections were evaluated paying attentions to the changes in the mucosal fold height (mfh), supravacuolization within enterocytes (snv), submucosal cellularity (smc) and lamina propria cellularity (lpc).

As shown below, histological scores under different evaluation categories for the same fish agree with each other in most cases. The mucosal fold height (mfh) and supravacuolization within enterocytes (snv) showed the least and most variable changes, whereas cellular infiltration within the submucosa (smc) and lamina propria (lpc) showed intermediate level of changes. To avoid the multicollinearity, we’ll select one of them for the association analysis.

# Get histological scores from the metadata
df_histo <- filter(metadata, SampleOrigin == "Mucosa") %>% 
  select(FishID, contains("Histo")) %>%
  rename_all(~gsub("Histology_", "", .x)) %>%
  gather(key = "item", value = "score", -FishID) %>%
  mutate(item = factor(item, c("mfh", "smc", "lpc", "snv")), score = factor(score, unique(.$score))) 

# Plot histological scores for each fish
ggplot(df_histo, aes(x = item, y = score)) +
  geom_point(colour = "blue") +
  geom_line(aes(group = 1), colour = "blue") +
  facet_wrap(~FishID, ncol = 6) +
  labs(x = "Histological characteristics evaluated", y = "Histological scores") +
  cowplot::theme_cowplot() +
  theme(panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black"))

The distibution of histological scores under each evaluation category can be quite unbalanced, which is chanlleging for running a proper association analysis. Let’s look at the histogram of histological scores under each evaluation category.

ggplot(df_histo, aes(score)) +
    geom_histogram(stat = "count") +
    facet_wrap(~item) +
    scale_y_continuous(limits = c(0, 30), breaks = 0:6*5, expand = expansion(mult = c(0, 0.05))) +
    theme_cowplot() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1),
          panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black")) 

The distibution of histological scores was indeed quite unbalanced. We’ll aggregate histological scores to binary data (normal VS. abnormal) so that data are more balanced. As microbiome data are sparse, aggregation of histologcial scores also increases the number of non-zero observations within each group, making association tests more reliable.

metadata <- metadata %>%
  mutate_at(vars(contains("Histo")), ~gsub("Mild|Moderate|Marked|Severe", "Abnormal", .x)) %>%
  mutate_at(vars(contains("Histo")), ~factor(.x, levels = c("Normal", "Abnormal")))

Let’s look at the distibution of histological scores after data aggregation.

filter(metadata, SampleOrigin == "Mucosa") %>% 
  select(FishID, contains("Histo")) %>%
  rename_all(~gsub("Histology_", "", .x)) %>%
  gather(key = "item", value = "score", -FishID) %>%
  mutate(item = factor(item, c("mfh", "smc", "lpc", "snv")), score = factor(score, unique(.$score))) %>% 
  ggplot(aes(score)) +
      geom_histogram(stat="count") +
      facet_wrap(~item) +
      scale_y_continuous(limits = c(0, 30), breaks = 0:6*5, expand = expansion(mult = c(0, 0.05))) +
      theme_bw(base_size = 14) +
      theme(panel.grid.major.y = element_line(size = 0.5, linetype = 'dashed', colour = "black")) 

After data aggregation, lpc and snv showed a much balanced distribution. As cellular infiltration within the lamina propria (lpc) is a direct indication of gut inflammation, we’ll use it for the association analysis.

Exploration of gene expression

Immune reponses

In addition to histolgocial examination, some of the marker genes of the gut inflammation in Atlantic salmon were also profiled by qPCR assays. Let’s plot the expression levels of these genes in fish assigned as normal or abnormal regarding the cellular infiltration within the lamina propria (lpc).

# Get the gene expression data on immune responses and convert to the long format
metadata_imm <- filter(metadata, SampleOrigin == "Mucosa") %>%
  rename_all(~gsub("qPCR_", "", .x)) %>%
  rename_all(~gsub("Histology_", "", .x)) %>%
  select(NetPen, lpc, myd88:il4) %>%
  gather(gene, quantity, -c(lpc, NetPen)) %>%
  mutate(gene = factor(gene, unique(.$gene)), lpc = factor(lpc, unique(.$lpc)))
  
# Make boxplots showing difference in the gene expression level under different histological scores
ggplot(metadata_imm, aes(x = lpc, y = quantity)) +
  geom_boxplot(outlier.shape = NA) +
  geom_point(aes(fill = lpc), size = 2, shape = 21, position = position_jitterdodge(0.2)) +
  facet_wrap(~gene, ncol = 4, scales = "free_y") +
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.1))) + 
  labs(x = "Histological score of lamina propria cellularity", y = "Quantity") +
  theme_bw(base_size = 14) +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.position = "none")

Next, let’s find out differential expressed genes.

de_imm <- metadata_imm %>%
  group_by(gene) %>% 
  nest() %>% 
  # fit linear mixed effects models
  mutate(lmm    = map(data, ~lmer(log(quantity) ~ lpc + (1|NetPen), data = .x)), 
         # mark models with singular fits
         sf     = map_chr(lmm, ~ifelse(is.null(.x@optinfo$conv$lme4$messages), "no", "yes")),
         # significance testing for the fixed effect using Kenward-Roger approximation
         anv    = map(lmm, ~anova(.x, ddf = "Kenward-Roger")),
         p_lmm  = map_dbl(anv, ~.x["lpc", "Pr(>F)"]),
         # get effect size: cohen's D
         es_lmm = map2_dbl(lmm, data, ~lme.dscore(mod = .x, data = .y, type = "lme4")[1, "d"]),
         # run Welch t-test
         lm     = map(data, ~t.test(log(quantity) ~ lpc, data = .x)),
         p_lm   = map_dbl(lm, ~.x$p.value),
         # get effect size: cohen's D
         es_lm  = map_dbl(data, ~cohensD(log(quantity) ~ lpc, data = .x, , method = "unequal"))) %>%
  select(gene, sf, p_lmm, p_lm, es_lmm, es_lm) %>%
  ungroup() %>%
  mutate(p = ifelse(sf == "no", p_lmm, p_lm),
         p_adj = p.adjust(p, method = "fdr"),
         "Cohen's D" = ifelse(sf == "no", es_lmm, es_lm)) %>%
  filter(p_adj < 0.05) %>%
  mutate_if(is.numeric, ~ifelse(.x < 0.001, "< 0.001", round(.x, 3)))
  
datatable(de_imm, options = list(columnDefs = list(list(className = 'dt-left', targets = c(1:9))))) 

The expression levels of these maker genes are often correlated. Let’s check the correlation between the differential expressed genes.

# Subset metadata to contain differential expressed genes only
metadata_immSig <- filter(metadata, SampleOrigin == "Mucosa") %>%
  column_to_rownames("FishID") %>%
  rename_all(~gsub("qPCR_", "", .x)) %>%
  select(as.character(de_imm$gene))

# Make correlation plot
chart.Correlation(metadata_immSig, histogram = TRUE, pch = 19)

To avoid multicollinearity and to reduce the number of association testing, we’ll run principle component analysis (PCA) and use the first principle component (PC) for the association analysis, which explains most of the variance in the data. As the expression levels of some of the genes are quite skewed, we’ll log transform the data before running PCA.

# Log transform the data
metadata_immSigLog <- rownames_to_column(metadata_immSig, "FishID") %>%
  mutate_if(is.numeric, ~log(.x)) %>%
  column_to_rownames("FishID")

# Check the data distribution after data transformation
chart.Correlation(metadata_immSigLog, histogram = TRUE, pch = 19)

Now we run the PCA. We can see that the PC1 explains most of the variance.

# Principle component analysis
pca_imm <- prcomp(metadata_immSigLog, scale = TRUE)

# Scree plot
fviz_eig(pca_imm)

Plot contributions of variables on the PC1.

fviz_contrib(pca_imm, choice="var", axes = 1)

PCA biplot.

# Subset metadata for the biplot annotation
pca_ann <- filter(metadata, SampleOrigin == "Mucosa") 

# calculate variance 
eigs_imm <- pca_imm$sdev^2

# make biplot  
fviz_pca_biplot(
  pca_imm, 
  select.var = list(name = NULL, cos2 = NULL, contrib = 9),
  col.ind = as.factor(pca_ann$Histology_lpc),
  palette = "Dark2",
  pointsize = 2, 
  invisible = "quali", # hide centroid
  repel = TRUE,
  label = "var",
  title = "",
  legend.title = "Lamina propria cellularity") +
  labs(x = paste0("PC1 (", round(100 * eigs_imm[1] / sum(eigs_imm), 1), "%)"), 
       y = paste0("PC2 (", round(100 * eigs_imm[2] / sum(eigs_imm), 1), "%)"))

Extract PC1.

imm_pc1 <- pca_imm$x %>% 
  as.data.frame() %>% 
  select(PC1) %>%
  rename(qPCR_immune_response = PC1) %>%
  rownames_to_column("FishID") %>%
  uncount(2)

Barrier functions

As shown below, the expression of barrier function relevant genes showed strong correlations.

# Subset metadata 
metadata_barr <- filter(metadata, SampleOrigin == "Mucosa") %>%
  column_to_rownames("FishID") %>%
  rename_all(~gsub("qPCR_", "", .x)) %>%
  select(muc2:cdh1)

# Correlation plot
chart.Correlation(metadata_barr, histogram = TRUE, pch = 19)

Again, we’ll run PCA and use the PC1 for the association analysis.

pca_barr <- prcomp(metadata_barr, scale = TRUE)

# Scree plot
fviz_eig(pca_barr)

Plot contributions of variables on the PC1

fviz_contrib(pca_barr, choice="var", axes = 1)

PCA biplot

# calculate variance 
eigs_barr <- pca_barr$sdev^2

# make pca biplot
fviz_pca_biplot(
  pca_barr, 
  select.var = list(name = NULL, cos2 = NULL, contrib = 5),
  col.ind = as.factor(pca_ann$Histology_lpc),
  palette = "Dark2",
  pointsize = 2, 
  invisible = "quali", 
  repel = TRUE,
  label = "var",
  title = "",
  legend.title = "Lamina propria cellularity") +
  labs(x = paste0("PC1 (", round(100 * eigs_barr[1] / sum(eigs_barr), 1), "%)"), 
       y = paste0("PC2 (", round(100 * eigs_barr[2] / sum(eigs_barr), 1), "%)"))

Extract PC1.

barr_pc1 <- pca_barr$x %>% 
  as.data.frame() %>% 
  select(PC1) %>%
  rename(qPCR_barrier_function = PC1) %>%
  rownames_to_column("FishID") %>%
  uncount(2)

Run MaAsLin2

Add the extracted principle components to the metadata.

metadata <- bind_cols(metadata, imm_pc1, barr_pc1) %>%
  mutate(Diet = factor(Diet, c("REF", "IM"))) %>%
  rename(Sample_origin = SampleOrigin) %>%
  column_to_rownames("rowname")
  
# Check if the rows were correctly matched during the table merging
identical(metadata$FishID, metadata$FishID1)
## [1] TRUE
identical(metadata$FishID, metadata$FishID2)    
## [1] TRUE

Define the fixed and random effects of interest.

# Fixed effects
fixef <- c("Diet", "Sample_origin", "DISI", "Histology_lpc", "qPCR_immune_response", "qPCR_barrier_function")

# Random effects
ranef <- c("FishID", "NetPen")

Run multivariate association analysis.

fit <- Maaslin2(input_data = tab_genus, 
                input_metadata = metadata,
                output = here("data", "maaslin2"),
                min_abundance = 0.0001,
                min_prevalence = 0.25,
                max_significance = 0.25,
                normalization = "TSS",
                transform = "LOG",
                analysis_method = "LM",
                random_effects = ranef,
                fixed_effects = fixef,
                correction = "BH",
                standardize = TRUE,
                plot_heatmap = TRUE,
                plot_scatter = TRUE)

Customize MaAsLin2 outputs

Prepare data

Merge MaAsLin2 outputs, feature table and metadata for plotting.

# Get features with significant associations for each metadata tested
sig_result <- filter(fit$results, qval <= 0.25) %>%
  mutate(N.not.zero = paste0(N.not.zero, "/", N)) %>%
  arrange(feature)

# Replicate metadata rows 
metadata_rep <- rownames_to_column(metadata, "SampleID") %>%
  uncount(nrow(sig_result)) %>%
  arrange(SampleID)

# Merge the collapsed feature table, metadata and features with significant associations  
tab_sigAss <- tab_genus %>%
  rownames_to_column("feature") %>%
  mutate_if(is.numeric, ~.x/sum(.x)) %>%
  filter(feature %in% unique(sig_result$feature)) %>%
  left_join(count(sig_result, feature), by = "feature") %>%
  uncount(n) %>%
  arrange(feature) %>% 
  bind_cols(sig_result, .) %>%
  gather(SampleID, abundance, "AqFl2-01":"AqFl2-72") %>%
  arrange(SampleID) %>%
  bind_cols(metadata_rep, .) %>%
  mutate(qval = ifelse(qval < 0.001, "< 0.001", round(qval, 3)), 
         coef = formatC(coef, format = "e", digits = 1), 
         ann_boxplot = paste("FDR:", qval),
         ann_scatter = paste(paste("FDR:", qval), 
                             "\n", 
                             paste("coefficient:", coef), 
                             "\n", 
                             paste("N.not.zero:", N.not.zero))) 

Heatmap

Make a customized heatmap using a modified function from the MaAsLin2 package.

hmp <- maaslin2_heatmap(output_results = here("data", "maaslin2", "all_results.tsv"),
                        first_n = length(unique(sig_result$feature)), # plot all significant associations
                        cell_value = "qval",
                        data_label = "data",
                        metadata_label = "metadata",
                        title = FALSE,
                        legend_title = FALSE,
                        legend_title_position = "leftcenter-rot", # topleft, topcenter, leftcenter-rot or lefttop-rot
                        color = c("blue", "grey90", "red"),
                        board_line_col = "white",
                        colnames_rotate = 90,
                        colnames_fontsize = 12,
                        rownames_fontsize = 10,
                        italize_rownames = TRUE)
hmp

Boxplot

Sample origin

Make boxplot for sample origin effect, color points by Diet.

# Add number of non-zero observation to text annotation
ann_origin <- filter(tab_sigAss, metadata == "Sample_origin" ) %>%
  group_by(feature, Sample_origin) %>%
  summarize(N = n(), N_nonzero_origin = paste0(sum(abundance != 0), "/", N)) %>%
  mutate(ann_origin = paste0(paste0("N.not.zero(", Sample_origin, "): ", N_nonzero_origin))) %>%
  group_by(feature) %>%
  summarize(n = sum(N), ann_origin = paste(ann_origin, collapse = "\n")) %>%
  uncount(n) %>%
  arrange(feature) %>%
  select(feature, ann_origin)

# Make a dataframe for plotting
sigAss_origin <- filter(tab_sigAss, metadata == "Sample_origin" ) %>%
  arrange(feature) %>%
  bind_cols(ann_origin) %>%
  mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_origin)) 
  
# Make plots
ggplot(sigAss_origin, aes(x = Sample_origin, y = abundance)) +
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(aes(colour = Diet), shape = 16, position = position_jitter(0.2)) +
  geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 2.5, hjust = 1, vjust = 1.05) +
  facet_wrap(~feature, ncol = 5, scales = "free_y") +
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)), 
                     labels = scales::percent_format(accuracy = 0.1)) +   
  labs(x = "Sample origin", y = "Relative abundance") + 
  scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(3,4)]) +
  theme_bw() +
  theme(legend.position = "top")

Diet

Make boxplot for Diet effect, color points by sample origin.

# Add number of non-zero observation to text annotation
ann_diet <- filter(tab_sigAss, metadata == "Diet" ) %>%
  group_by(feature, Diet) %>%
  summarize(N = n(), N_nonzero_diet = paste0(sum(abundance != 0), "/", N)) %>%
  mutate(ann_diet = paste0(paste0("N.not.zero(", Diet, "): ", N_nonzero_diet))) %>%
  group_by(feature) %>%
  summarize(n = sum(N), ann_diet = paste(ann_diet, collapse = "\n")) %>%
  uncount(n) %>%
  arrange(feature) %>%
  select(feature, ann_diet)

# Make a dataframe for plotting 
sigAss_diet <- filter(tab_sigAss, metadata == "Diet" ) %>%
  arrange(feature) %>%
  bind_cols(ann_diet) %>%
  mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_diet))

# Make plots
ggplot(sigAss_diet, aes(x = Diet, y = abundance)) +
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.2)) +
  geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
  facet_wrap(~feature, ncol = 5, scales = "free_y") +
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)), 
                     labels = scales::percent_format(accuracy = 0.1)) +   
  labs(y = "Relative abundance", colour = "Sample origin") +
  scale_fill_brewer(palette = "Dark2") +
  theme_bw() +
  theme(legend.position = "top")

Histology

Make boxplot for hisstological scores, color points by Sample origin.

# Add number of non-zero observation to text annotation
ann_lpc <- filter(tab_sigAss, metadata == "Histology_lpc" ) %>%
  group_by(feature, Histology_lpc) %>%
  summarize(N = n(), N_nonzero_lpc = paste0(sum(abundance != 0), "/", N)) %>%
  mutate(ann_lpc = paste0(paste0("N.not.zero(", Histology_lpc, "): ", N_nonzero_lpc))) %>%
  group_by(feature) %>%
  summarize(n = sum(N), ann_lpc = paste(ann_lpc, collapse = "\n")) %>%
  uncount(n) %>%
  arrange(feature) %>%
  select(feature, ann_lpc)

# Make a dataframe for plotting
sigAss_lpc <- filter(tab_sigAss, metadata == "Histology_lpc" ) %>%
  arrange(feature) %>%
  bind_cols(ann_lpc) %>%
  mutate(ann_boxplot = paste0(ann_boxplot, "\n", ann_lpc)) 

# Make plots
ggplot(sigAss_lpc, aes(x = Histology_lpc, y = abundance)) +
  geom_boxplot(width = 0.5, outlier.shape = NA) +
  geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.2)) +
  geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
  facet_wrap(~feature, ncol = 2, scales = "free_y") +
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)), 
                     labels = scales::percent_format(accuracy = 0.1)) +   
  labs(x = "Lamina propria cellularity in distal gut", y = "Relative abundance", 
       colour = "Sample origin") +  
  scale_fill_brewer(palette = "Dark2") +
  theme_bw()

Scatter plot

DISI

Make scatter plot for DISI, color points by sample origin.

filter(tab_sigAss, metadata == "DISI" ) %>%
  ggplot(aes(x = DISI, y = abundance)) +
    geom_point(aes(colour = Sample_origin), size = 2) +
    geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, ncol = 2, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)), 
                       labels = scales::percent_format(accuracy = 0.1)) +   
    labs(y = "Relative abundance", colour = "Sample origin") +   
    scale_fill_brewer(palette = "Dark2") +
    theme_bw()

qPCR_immune_response

Make scatter plot for PC1 of immune response genes, color points by sample origin. Note that as the PC1 increases, the expression levels of immune response genes decrease (see PCA biplot above). Hence, a positive coefficient from the MaAsLin2 outputs is indicative of negative correlation between the microbial clades and expression levels of immune genes, and vice versa.

filter(tab_sigAss, metadata == "qPCR_immune_response" ) %>%
  ggplot(aes(x = qPCR_immune_response, y = abundance)) +
    geom_point(aes(colour = Sample_origin), size = 2) +
    geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, ncol = 3, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)), 
                       labels = scales::percent_format(accuracy = 0.1)) +   
    labs(x = "qPCR: immune responses in distal gut (PC1 of PCA)", y = "Relative abundance",
         colour = "Sample origin") +  
    scale_fill_brewer(palette = "Dark2") +
    theme_bw()

qPCR_barrier_function

Make scatter plot for PC1 of barrier function relevant genes, color points by sample origin. Note that as the PC1 increases, the expression levels of barrier function relevant genes decrease (see PCA biplot above). Hence, a positive coefficient from the MaAsLin2 outputs is indicative of negative correlation between the microbial clades and expression levels of barrier function relevant genes, and vice versa.

filter(tab_sigAss, metadata == "qPCR_barrier_function" ) %>%
  ggplot(aes(x = qPCR_barrier_function, y = abundance)) +
    geom_point(aes(colour = Sample_origin), size = 2) +
    geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, ncol = 3, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)), 
                       labels = scales::percent_format(accuracy = 0.1)) +   
    labs(x = "qPCR: barrier function in distal gut (PC1 of PCA)", y = "Relative abundance",
         colour = "Sample origin") +  
    scale_fill_brewer(palette = "Dark2") +
    theme_bw()

Highlight MaAsLin2 outputs

Highlight sample origin effects

p_origin <- sigAss_origin %>% 
  filter(feature %in% c("Brevinema andersonii", "f__Spirochaetaceae")) %>%
  mutate(labs = ifelse(grepl("__", feature), paste0("bold(", feature, ")"), paste0("bolditalic(", feature, ")")),
         labs = gsub("\\s+", "~", labs)) %>%
  mutate(feature = factor(feature, labels = unique(labs))) %>%
  ggplot(aes(x = Sample_origin, y = abundance)) +
  geom_boxplot(outlier.shape = NA, width = 0.3) +
  geom_jitter(aes(colour = Diet), shape = 16, position = position_jitter(0.1)) +
  geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
  facet_wrap(~feature, ncol = 2, scales = "free_y", labeller = label_parsed) +
  scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.5)), 
                     breaks = 0:5*0.2, labels = scales::percent_format(accuracy = 1)) +
  scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(3,4)]) +
  labs(x = "Sample origin", y = "Relative abundance") + 
  theme_bw() +
  theme(legend.margin = margin(0, 0, 0, 0),
        legend.box.margin = margin(-5, -5, -5, -5))

p_origin

Highlight diet effects

p_diet <- sigAss_diet %>% 
  filter(feature %in% c("Bacillus", "Corynebacterium 1")) %>%
  ggplot(aes(x = Diet, y = abundance)) +
    geom_boxplot(outlier.shape = NA, width = 0.3) +
    geom_jitter(aes(colour = Sample_origin), shape = 16, position = position_jitter(0.1)) +
    geom_text(aes(x = Inf, y = Inf, label = ann_boxplot), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, ncol = 2, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.6)),
                       labels = scales::percent_format(accuracy = 1)) +  
    scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
    labs(color = "Sample origin", y = "Relative abundance") +
    theme_bw() +
    theme(strip.text = element_text(face = "bold.italic"),
          legend.margin = margin(0, 0, 0, 0),
          legend.box.margin = margin(-5, -5, -5, -5))
p_diet

Highlight the significant association between Brevinema andersonii and the immune gene expression levels.

p_imm <- tab_sigAss %>%
  filter(metadata == "qPCR_immune_response", feature == "Brevinema andersonii") %>%
  ggplot(aes(x = qPCR_immune_response, y = abundance)) +
    geom_point(aes(colour = Sample_origin), size = 2) +
    geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.3)), 
                       labels = scales::percent_format(accuracy = 1)) +   
    labs(x = "qPCR: immune responses (PC1 of PCA)", y = "Relative abundance",
         colour = "Sample origin") +  
    scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
    theme_bw() +
    theme(strip.text = element_text(face = "bold.italic"),
          legend.margin = margin(0, 0, 0, 0),
          legend.box.margin = margin(-5, -5, -5, -5))

p_imm

Highlight the significant association between the unclassified Spirochaetaceae and the barrier gene expression levels.

p_barr <- tab_sigAss %>%
  filter(metadata == "qPCR_barrier_function", feature == "f__Spirochaetaceae") %>%
  ggplot(aes(x = qPCR_barrier_function, y = abundance)) +
    geom_point(aes(colour = Sample_origin), size = 2) +
    geom_text(aes(x = Inf, y = Inf, label = ann_scatter), size = 3, hjust = 1, vjust = 1.05) +
    facet_wrap(~feature, scales = "free_y") +
    scale_y_continuous(limits = c(0, NA), expand = expansion(mult = c(0, 0.35)), 
                       labels = scales::percent_format(accuracy = 1)) +   
    labs(x = "qPCR: barrier function (PC1 of PCA)", y = "Relative abundance",
         colour = "Sample origin") +  
    scale_color_manual(values = brewer.pal(n = 8, name = "Dark2")[c(1,2)]) +
    theme_bw() +
    theme(strip.text = element_text(face = "bold"),
          legend.margin = margin(0, 0, 0, 0),
          legend.box.margin = margin(-5, -5, -5, -5))

p_barr

Assemble plots

# Assemble ggplots
ind <- plot_grid(p_origin, p_diet, p_imm, p_barr, align = "v", axis = "lr", 
                 ncol = 1, labels = c("B", "C", "D", "E"))

# Convert the heatmap (made by the ComplexHeatmap) into a "grob" object
hmp_grob <- grid.grabExpr(draw(hmp, heatmap_legend_side = "left")) 

# Combine the heatmap and ggplots
plot_grid(hmp_grob,
          ind, 
          rel_widths = c(1, 1.55),
          labels = c("A", ""),
          ncol = 2) 

# Export the assembled plot
ggsave(here("result", "figures", "Figure 6.tiff"), width = 9, height = 10,
       units = "in", dpi = 300, compression = "lzw")

Session information

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=nb_NO.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=nb_NO.UTF-8   
##  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=nb_NO.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=nb_NO.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ComplexHeatmap_2.0.0       circlize_0.4.8             Maaslin2_0.99.12           DT_0.11                    EMAtools_0.1.3            
##  [6] lsr_0.5                    lmerTest_3.1-1             lme4_1.1-21                Matrix_1.2-18              factoextra_1.0.6          
## [11] phyloseq_1.28.0            MicrobeR_0.3.1             PerformanceAnalytics_1.5.3 xts_0.12-0                 zoo_1.8-7                 
## [16] RColorBrewer_1.1-2         cowplot_1.0.0              forcats_0.4.0              stringr_1.4.0              dplyr_0.8.5               
## [21] purrr_0.3.3                readr_1.3.1                tidyr_1.0.2                tibble_2.1.3               ggplot2_3.3.0             
## [26] tidyverse_1.3.0            here_0.1                   knitr_1.27                
## 
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.0.0      RSQLite_2.2.0         htmlwidgets_1.5.1     munsell_0.5.0         codetools_0.2-16      effectsize_0.2.0     
##   [7] withr_2.1.2           colorspace_1.4-1      Biobase_2.44.0        rstudioapi_0.10       stats4_3.6.3          ggsignif_0.6.0       
##  [13] labeling_0.3          emmeans_1.4.4         optparse_1.6.4        lpsymphony_1.12.0     pheatmap_1.0.12       bit64_0.9-7          
##  [19] farver_2.0.3          rhdf5_2.28.1          rprojroot_1.3-2       coda_0.19-3           vctrs_0.2.2           treeio_1.8.1         
##  [25] generics_0.0.2        TH.data_1.0-10        xfun_0.12             R6_2.4.1              clue_0.3-57           philr_1.10.1         
##  [31] assertthat_0.2.1      promises_1.1.0        scales_1.1.0          multcomp_1.4-12       gtable_0.3.0          phangorn_2.5.5       
##  [37] sandwich_2.5-1        rlang_0.4.4           GlobalOptions_0.1.1   splines_3.6.3         lazyeval_0.2.2        broom_0.5.4          
##  [43] BiocManager_1.30.10   yaml_2.2.0            reshape2_1.4.3        modelr_0.1.5          crosstalk_1.0.0       backports_1.1.5      
##  [49] httpuv_1.5.2          tools_3.6.3           logging_0.10-108      zCompositions_1.3.3-1 ellipsis_0.3.0        biomformat_1.12.0    
##  [55] BiocGenerics_0.30.0   hash_2.2.6.1          Rcpp_1.0.3            plyr_1.8.5            zlibbioc_1.30.0       ggpubr_0.2.4         
##  [61] pbapply_1.4-2         GetoptLong_0.1.8      S4Vectors_0.22.1      haven_2.2.0           ggrepel_0.8.1         cluster_2.1.0        
##  [67] fs_1.3.1              DECIPHER_2.12.0       magrittr_1.5          data.table_1.12.8     reprex_0.3.0          truncnorm_1.0-8      
##  [73] mvtnorm_1.0-12        sjmisc_2.8.3          hms_0.5.3             mime_0.8              evaluate_0.14         xtable_1.8-4         
##  [79] pbkrtest_0.4-7        sjstats_0.17.9        readxl_1.3.1          IRanges_2.18.3        shape_1.4.4           compiler_3.6.3       
##  [85] crayon_1.3.4          minqa_1.2.4           htmltools_0.4.0       rtk_0.2.5.8           mgcv_1.8-31           later_1.0.0          
##  [91] DataCombine_0.2.21    lubridate_1.7.4       DBI_1.1.0             sjlabelled_1.1.3      dbplyr_1.4.2          MASS_7.3-51.5        
##  [97] boot_1.3-24           ade4_1.7-13           getopt_1.20.3         permute_0.9-5         cli_2.0.1             quadprog_1.5-8       
## [103] parallel_3.6.3        insight_0.8.2         igraph_1.2.4.2        pkgconfig_2.0.3       rvcheck_0.1.7         numDeriv_2016.8-1.1  
## [109] plotly_4.9.1          xml2_1.2.2            foreach_1.4.7         ggtree_1.16.4         picante_1.8           multtest_2.40.0      
## [115] XVector_0.24.0        estimability_1.3      rvest_0.3.5           NADA_1.6-1            digest_0.6.23         parameters_0.6.0     
## [121] vegan_2.5-6           Biostrings_2.52.0     rmarkdown_2.1         cellranger_1.1.0      fastmatch_1.1-0       tidytree_0.3.1       
## [127] shiny_1.4.0           rjson_0.2.20          nloptr_1.2.1          lifecycle_0.1.0       nlme_3.1-144          jsonlite_1.6         
## [133] Rhdf5lib_1.6.3        viridisLite_0.3.0     fansi_0.4.1           pillar_1.4.3          lattice_0.20-38       fastmap_1.0.1        
## [139] httr_1.4.1            survival_3.1-8        glue_1.3.1            bayestestR_0.5.2      png_0.1-7             iterators_1.0.12     
## [145] bit_1.1-15.1          stringi_1.4.5         performance_0.4.4     blob_1.2.1            memoise_1.1.0         ape_5.3